Add Phase 3c run manifests and publication identity #855
Draft
Fixes #860
Also fixes #854.
Summary
Implements the Phase 3c execution-ledger boundary by adding typed run and step manifests. Pipeline runs now write `run_manifest.json` plus per-step JSON manifests under `/pipeline/runs/{run_id}/steps/`, with declared inputs, parameters, outputs, diagnostics, checksums, reuse decisions, attempts, timings, and failure information.
Extends Phase 3c with publication identity plumbing across GitHub, Modal, and Hugging Face staging. GitHub now resolves a safe `publication_id` before Modal starts; pipeline runs use that value as the common run namespace when provided; Modal app and volume names can be publication-scoped; run and step manifests record the publication context; HF staging uploads write to `staging/{publication_id}/...` and include `_publication_context.json`.
Changes the release workflow coupling so `push.yaml` explicitly dispatches `pipeline.yaml` with `workflow_dispatch` after dataset build and versioning complete. The dispatched pipeline receives the shared `publication_id` plus the exact post-version-bump `source_sha`, and `pipeline.yaml` no longer passively launches from `Update package version` push events.
The pipeline now uses manifest-backed output validation for reuse decisions, records H5 scope fingerprints inside regional/national H5 step manifests, records partial H5 reuse counts, records data-build checkpoint hit/miss counts, and validates completed step outputs before release promotion.
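To make the manifest-backed reuse idea concrete, here is a minimal sketch of a per-step manifest and a checksum-based reuse check. It is not the code in this PR: `StepManifest`, `write_step_manifest`, `can_reuse`, and every field name are hypothetical, and `root` stands in for the `/pipeline` mount.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


@dataclass
class StepManifest:
    # Hypothetical field names; the real schema in this PR may differ.
    run_id: str
    step_name: str
    status: str  # e.g. "completed" or "failed"
    parameters: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)  # output path -> sha256 digest


def write_step_manifest(root: Path, manifest: StepManifest) -> Path:
    """Write the manifest to {root}/runs/{run_id}/steps/{step_name}.json."""
    steps_dir = root / "runs" / manifest.run_id / "steps"
    steps_dir.mkdir(parents=True, exist_ok=True)
    path = steps_dir / f"{manifest.step_name}.json"
    path.write_text(json.dumps(asdict(manifest), indent=2, sort_keys=True))
    return path


def can_reuse(manifest_path: Path, parameters: dict) -> bool:
    """Allow reuse only if the step completed with the same parameters and
    every recorded output still exists with a matching checksum."""
    if not manifest_path.exists():
        return False
    recorded = json.loads(manifest_path.read_text())
    if recorded["status"] != "completed" or recorded["parameters"] != parameters:
        return False
    return all(
        Path(out).exists() and sha256_of(Path(out)) == digest
        for out, digest in recorded["outputs"].items()
    )
```

The design point is that a reuse decision reads the recorded manifest and re-hashes the outputs, rather than trusting that a file with the expected name is valid.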
Verification
- `uv run --no-sync pytest tests/unit/test_publication_context.py tests/unit/utils/test_data_upload.py tests/unit/test_release_manifest.py tests/unit/test_step_manifest.py tests/unit/test_modal_data_build.py tests/unit/test_pipeline.py -q`
- `uv run --no-sync pytest tests/unit/test_publication_context.py tests/unit/test_pipeline.py -q`
- `uv run --no-sync ruff check .github/scripts/resolve_publication_context.py .github/scripts/spawn_modal_pipeline.py policyengine_us_data/utils/publication_context.py policyengine_us_data/utils/data_upload.py policyengine_us_data/utils/release_manifest.py policyengine_us_data/utils/step_manifest.py policyengine_us_data/storage/upload_completed_datasets.py modal_app/pipeline.py modal_app/data_build.py modal_app/local_area.py modal_app/remote_calibration_runner.py modal_app/h5_test_harness.py tests/unit/test_publication_context.py tests/unit/utils/test_data_upload.py tests/unit/test_release_manifest.py`
- `uv run --no-sync ruff check .github/scripts/spawn_modal_pipeline.py modal_app/pipeline.py`
- `uv run --no-sync python scripts/run_quality_guards.py`
- `uv run --no-sync python -m py_compile .github/scripts/resolve_publication_context.py .github/scripts/spawn_modal_pipeline.py policyengine_us_data/utils/publication_context.py policyengine_us_data/utils/data_upload.py policyengine_us_data/utils/release_manifest.py policyengine_us_data/utils/step_manifest.py policyengine_us_data/storage/upload_completed_datasets.py modal_app/pipeline.py modal_app/data_build.py modal_app/local_area.py modal_app/remote_calibration_runner.py modal_app/h5_test_harness.py`
- `GITHUB_RUN_ID=123456789 GITHUB_RUN_ATTEMPT=1 GITHUB_SHA=abcdef1234567890 GITHUB_REPOSITORY=PolicyEngine/policyengine-us-data GITHUB_SERVER_URL=https://github.com GITHUB_WORKFLOW='Run Pipeline' GITHUB_REF=refs/heads/main GITHUB_REF_NAME=main MODAL_ENVIRONMENT=main US_DATA_MODAL_APP_PREFIX=policyengine-us-data-pub .venv/bin/python .github/scripts/resolve_publication_context.py`
- `git diff --check`
Local full `uv run` remains blocked on this Intel macOS environment because the locked `torch==2.9.1` wheel is unavailable for `macosx_x86_64`. For targeted tests, I manually installed missing local-only test dependencies into the worktree venv.